Identifying Synonymous Expressions From A Bilingual Corpus For Example-Based Machine Translation
Abstract
Example-based machine translation (EBMT) is based on a bilingual corpus. In EBMT, sentences similar to an input sentence are retrieved from a bilingual corpus, and the output is then generated from the translations of those similar sentences. A similarity measure between the input sentence and each sentence in the bilingual corpus is therefore important for EBMT. If some similar sentences are missed during retrieval, translation quality drops. In this paper, we describe a method to acquire synonymous expressions from a bilingual corpus and use them to expand the retrieval of similar sentences. Synonymous expressions are acquired from differences between synonymous sentences. Synonymous sentences are clustered by the equivalence of their translations. Our method has the advantage of not relying on rich linguistic knowledge, such as sentence structure and dictionaries. We demonstrate the effect of applying our method to a simple EBMT system.
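The two steps named in the abstract (clustering sentences by equivalent translations, then reading synonymous expressions off the differences between clustered sentences) might be sketched as follows. This is only an illustration under strong assumptions, not the paper's procedure: "equivalence" is taken to mean an identical target string, sentences are split on whitespace, and differences are found with a generic token diff. The function names and the toy corpus are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def cluster_by_translation(pairs):
    """Group source sentences that share an identical target translation.

    Assumption: translation equivalence is approximated by exact string match.
    """
    clusters = defaultdict(list)
    for source, target in pairs:
        if source not in clusters[target]:
            clusters[target].append(source)
    return clusters


def synonymous_expressions(pairs):
    """Collect token spans that differ between sentences in the same cluster."""
    found = set()
    for sources in cluster_by_translation(pairs).values():
        for a, b in combinations(sources, 2):
            ta, tb = a.split(), b.split()
            matcher = SequenceMatcher(None, ta, tb, autojunk=False)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                # Only substituted spans are kept; pure insertions and
                # deletions are ignored in this sketch.
                if tag == "replace":
                    found.add((" ".join(ta[i1:i2]), " ".join(tb[j1:j2])))
    return found


# Hypothetical toy corpus: (source sentence, target translation) pairs.
corpus = [
    ("i would like a room", "heya o onegai shimasu"),
    ("i want a room", "heya o onegai shimasu"),
    ("where is the station", "eki wa doko desu ka"),
]
print(synonymous_expressions(corpus))
```

On this toy corpus the first two sentences fall into one cluster, and the only differing span yields the candidate pair ("would like", "want"). A retrieval step could then treat the two expressions as interchangeable when matching an input sentence against the corpus.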
Similar resources
Identifying bilingual Multi-Word Expressions for Statistical Machine Translation
MultiWord Expressions (MWEs) represent a key issue for numerous applications in Natural Language Processing (NLP), especially for Machine Translation (MT). In this paper, we describe a strategy for detecting translation pairs of MWEs in a French-English parallel corpus. In addition, we introduce three methods aiming to integrate extracted bilingual MWEs in MOSES, a phrase-based Statistical Machine...
Building Multiword Expressions Bilingual Lexicons for Domain Adaptation of an Example-Based Machine Translation System
In this paper, we describe a hybrid approach to automatically build bilingual lexicons of Multiword Expressions (MWEs) from parallel corpora. More specifically, we investigate the impact of using a domain-specific bilingual lexicon of MWEs on domain adaptation of an Example-Based Machine Translation (EBMT) system. We conducted experiments on the English-French language pair and two kinds of texts...
Synonymous Collocation Extraction Using Translation Information
Automatically acquiring synonymous collocation pairs such as and from corpora is a challenging task. For this task, we can, in general, have a large monolingual corpus and/or a very limited bilingual corpus. Methods that use monolingual corpora alone or use bilingual corpora alone are apparently inadequate because of low precision or low coverage. I...
Automatic Construction of Translation Knowledge for Corpus-based Machine Translation
Many machine translation (MT) systems that utilize the knowledge automatically acquired from bilingual corpora have been proposed in conjunction with efforts to accumulate corpora. We call this approach corpus-based machine translation in this thesis. This thesis focuses on automatic construction of the translation knowledge needed for corpus-based MT and discusses the following three tasks. 1....
Evaluating Features for Identifying Japanese-Chinese Bilingual Synonymous Technical Terms from Patent Families
In the process of translating patent documents, a bilingual lexicon of technical terms is an inevitable knowledge source. It is important to develop techniques for automatically acquiring technical term translation equivalent pairs from parallel patent documents. We take an approach of utilizing the phrase table of a state-of-the-art phrase-based statistical machine translation model. First, we coll...